Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats
Authors
Abstract
This paper was motivated by the problem of developing an optimal strategy for exploring a large oil and gas field in the North Sea. Where should we drill first? Where do we drill next? The problem resembles a classical multiarmed bandit problem, but probabilistic dependence plays a key role: outcomes at drilled sites reveal information about neighboring targets. Good exploration strategies will take advantage of this information as it is revealed. We develop heuristic policies for sequential exploration problems and complement these heuristics with upper bounds on the performance of an optimal policy. We begin by grouping the targets into clusters of manageable size. The heuristics are derived from a model that treats these clusters as independent. The upper bounds are given by assuming each cluster has perfect information about the results from all other clusters. The analysis relies heavily on results for bandit superprocesses, a generalization of the classical multiarmed bandit problem. We evaluate the heuristics and bounds using Monte Carlo simulation and, in our problem, we find that the heuristic policies are nearly optimal.

Subject Classifications: Dynamic Programming, Multiarmed Bandits, Bandit Superprocesses, Information Relaxations.

∗ We are grateful to Jo Eidsvik and Gabriele Martinelli for sparking our interest in this problem and for sharing their detailed model for our motivating example.
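The abstract names the main ingredients of the approach: group correlated targets into clusters, derive a heuristic policy from a model that treats clusters as independent, bound the optimal value by granting access to information from other clusters, and compare the two by Monte Carlo simulation. The following is a minimal Python sketch of that heuristic-versus-bound pattern on a toy field; the Beta-Bernoulli cluster model, the payoffs, the myopic index policy, and the fully clairvoyant (perfect-information) bound are all simplifying assumptions made here for illustration, not the paper's bandit-superprocess machinery or its actual North Sea model.

```python
import random

# Toy field (illustrative assumptions, not the paper's model):
# clusters of wells whose outcomes are correlated through a latent
# cluster success rate theta_c ~ Beta(ALPHA, BETA).
N_CLUSTERS = 3
TARGETS_PER_CLUSTER = 5
ALPHA, BETA = 1.0, 1.0     # prior on each cluster's latent success rate
REVENUE, COST = 10.0, 4.0  # payoff of a producing well, cost of one well

def sample_field(rng):
    """Draw latent cluster qualities, then the well outcomes they induce."""
    thetas = [rng.betavariate(ALPHA, BETA) for _ in range(N_CLUSTERS)]
    return [[rng.random() < th for _ in range(TARGETS_PER_CLUSTER)]
            for th in thetas]

def heuristic_value(outcomes):
    """Myopic heuristic that treats clusters as independent: repeatedly
    drill in the cluster whose Beta-Bernoulli posterior mean gives the
    largest positive one-step expected profit; stop when none is positive."""
    posterior = [(ALPHA, BETA)] * N_CLUSTERS   # (pseudo-successes, pseudo-failures)
    wells_left = [TARGETS_PER_CLUSTER] * N_CLUSTERS
    total = 0.0
    while True:
        best_c, best_index = None, 0.0
        for c in range(N_CLUSTERS):
            a, b = posterior[c]
            index = REVENUE * a / (a + b) - COST  # one-step expected profit
            if wells_left[c] > 0 and index > best_index:
                best_c, best_index = c, index
        if best_c is None:                        # every index <= 0: stop
            return total
        wells_left[best_c] -= 1
        success = outcomes[best_c][wells_left[best_c]]
        total += REVENUE * success - COST
        a, b = posterior[best_c]
        posterior[best_c] = (a + 1, b) if success else (a, b + 1)

def clairvoyant_value(outcomes):
    """Perfect-information bound: a clairvoyant who sees every outcome in
    advance drills exactly the producing wells, so its average value
    upper-bounds any policy that learns only by drilling."""
    return sum(REVENUE - COST for cluster in outcomes for x in cluster if x)

def monte_carlo(n_trials=20_000, seed=0):
    rng = random.Random(seed)
    h = ub = 0.0
    for _ in range(n_trials):
        outcomes = sample_field(rng)
        h += heuristic_value(outcomes)
        ub += clairvoyant_value(outcomes)
    print(f"heuristic value   ~ {h / n_trials:.3f}")
    print(f"clairvoyant bound ~ {ub / n_trials:.3f}")

if __name__ == "__main__":
    monte_carlo()
```

In this toy version, the gap between the two Monte Carlo averages plays the role of the paper's optimality gap: a small gap certifies that the heuristic is close to optimal without ever computing the optimal policy.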
Similar resources
Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces
We consider sequential decision making under uncertainty, where the goal is to optimize over a large decision space using noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper s...
Exploration-Free Policies in Dynamic Pricing and Online Decision-Making
Growing availability of data has enabled practitioners to tailor decisions at the individual level. This involves learning a model of decision outcomes conditional on individual-specific covariates or features. Recently, contextual bandits have been introduced as a framework to study these online and sequential decision making problems. This literature predominantly focuses on algorithms that ba...
On Interruptible Pure Exploration in Multi-Armed Bandits
Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs, which allows for adapting recent advances in non-interruptible strategies to the interruptible setting, while guaranteeing exponential-rate pe...
The End of Optimism
Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the optimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower...
Journal: Operations Research
Volume: 61, Issue: -
Pages: -
Publication date: 2013